Pattern Mining with Natural Language Processing: An Exploratory Approach

Authors

  • Ana Cristina Mendes
  • Cláudia Antunes
Abstract

Pattern mining stems from the need to discover hidden knowledge in very large amounts of data, regardless of the form in which the data is presented. Natural Language Processing (NLP), in turn, arose from humans' need to be understood by computers. In this paper we present an exploratory approach that aims to bring together the best of both worlds. Our goal is to discover patterns in linguistically processed texts by combining state-of-the-art NLP tools with traditional pattern mining algorithms. Articles from a Portuguese newspaper are the input to a series of tests described in this paper. First, they are processed by an NLP chain that performs a deep linguistic analysis of the text; afterwards, the pattern mining algorithms Apriori and GenPrefixSpan are applied. The results showed that pattern mining techniques are applicable to structured textual data, and they also provided several pieces of evidence about the structure of the language.
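
As a minimal sketch of how the two algorithm families named above differ on linguistically processed text, the Python below runs a basic Apriori (unordered itemsets) and a basic PrefixSpan (ordered subsequences, gaps allowed) over sentences reduced to part-of-speech tags. The toy `SENTENCES` data, the tag names, and the functions `apriori` and `prefixspan` are hypothetical illustrations, not the authors' NLP chain, data, or code. The abstract names GenPrefixSpan, a generalization of PrefixSpan; this sketch substitutes plain, unconstrained PrefixSpan.

```python
from itertools import combinations

# Hypothetical toy input: each sentence reduced to its sequence of POS tags,
# standing in for the output of an NLP chain (not the paper's data).
SENTENCES = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "ADJ"],
    ["PRON", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB"],
]


def apriori(transactions, min_support):
    """Return {frozenset: count} for every itemset found in >= min_support transactions."""
    sets = [frozenset(t) for t in transactions]  # order is discarded here
    current = [frozenset([i]) for i in {i for t in sets for i in t}]
    frequent = {}
    k = 1
    while current:
        # Count each candidate's support with one pass over the transactions.
        counts = {c: sum(1 for t in sets if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset (Apriori property).
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def prefixspan(sequences, min_support):
    """Return {tuple: count} for frequent subsequences (plain PrefixSpan, no gap constraints)."""
    results = {}

    def grow(db, prefix):
        # Support of prefix + item = number of projected sequences containing item.
        counts = {}
        for seq in db:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, n in counts.items():
            if n < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = n
            # Projected database: the suffix after the first occurrence of item.
            suffixes = [seq[seq.index(item) + 1:] for seq in db if item in seq]
            grow(suffixes, pattern)

    grow([list(s) for s in sequences], ())
    return results


if __name__ == "__main__":
    for itemset, n in sorted(apriori(SENTENCES, 3).items(), key=lambda x: -len(x[0])):
        print(sorted(itemset), n)
    for pattern, n in sorted(prefixspan(SENTENCES, 3).items(), key=lambda x: -len(x[0])):
        print(" -> ".join(pattern), n)
```

With a minimum support of 3 sentences, Apriori reports that DET, NOUN and VERB co-occur in all four sentences, while PrefixSpan additionally shows that DET ... NOUN ... VERB appears in that order in three of them. Word order is the extra structure that sequential mining recovers from text and that itemset mining throws away.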

Similar resources

A constraint-based querying system for exploratory pattern discovery

In this article we present ConQueSt, a constraint based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive, iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint based query language, which allows the discovery process to be effectively driven toward potentially inte...

Ontology generation for large email collections

This paper presents a new approach to identifying concepts expressed in a collection of email messages, and organizing them into an ontology or taxonomy for browsing. It incorporates techniques from text mining, information retrieval, natural language processing and machine learning to generate a concept ontology. Nominal N-gram mining is used to identify candidate concepts. Wordnet and surface...

Exploratory Text Analysis using Lexical Episode Plots

In this paper, we present Lexical Episode Plots, a novel automated text-mining and visual analytics approach for exploratory text analysis. In particular, we first describe an algorithm for automatically annotating text regions to examine prominent themes within natural language texts. The algorithm is based on lexical chaining to find spans of text in which the frequency of a term is significa...

Maytag: A Multi-Staged Approach to Identifying Complex Events in Textual Data

We present a novel application of NLP and text mining to the analysis of financial documents. In particular, we describe an implemented prototype, Maytag, which combines information extraction and subject classification tools in an interactive exploratory framework. We present experimental results on their performance, as tailored to the financial domain, and some forward-looking extensions to ...

Text Mining in Analyzing the Presentation of Educational Trainers

This work deals with text analysis that involves information retrieval through lexical analysis to learn word occurrence and distributions, pattern recognition, information extraction and data mining techniques, followed by visualization and predictive analytics. The primary goal is to turn text into data for analysis, through application of natural language processing (NLP) and analytical too...

Publication date: 2009